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1 Introduction 



The lattice provides a regularization for QCD, which allows us to study also non- 
perturbative aspects of the theory from first principles. In order to implement a lattice 
regularization, a non-vanishing lattice spacing a has to be introduced. To achieve the 
goal of making contact with the physical world, it is then unavoidable that results ob- 
tained, e.g. by numerical simulations, have to be extrapolated to zero lattice spacing in 
order to reach the continuum limit. The rate with which the continuum limit can be 
approached will then depend on the amount of contamination of the results by lattice 
spacing effects. 

A systematic approach to reduce discretization errors is Symanzik's improvement 
programme for on-shell quantities JlJ — . Applying this programme for Wilson fermions, 
it turns out that for cancelling the 0(a) effects it is sufficient to add only one new 
term into the action as suggested by Sheikholeslami and Wohlert Q. The coefficient 
c sw multiplying this term then has to be tuned in such a way that no 0(a) effects 
remain in on-shell quantities. In order to achieve this, it is necessary to determine c sw 
non-perturbatively by imposing suitable so-called improvement conditions pHlC|. For 
a complete cancellation of all 0(a) effects, also the improvement of various composite 
fields have to be performed, which introduces a not too large number of additional 
improvement coefficients. 

This programme was successfully applied in the quenched approximation and the 
effects of improvement have been verified [p~0|— p~6f] . Since systematic uncertainties caused 
by lattice artifacts are strongly suppressed in the complete 0(a) improved theory, it is 
expected that simulations can be done at larger lattice spacing, reducing in this way 
their cost substantially. This point of view should hold in particular if we switch from 
the quenched approximation to the full theory, including also the effects of Nf flavors of 
dynamical fermions. Since simulations with Nf ^ are much more expensive than ones 
with Nf = 0, we expect especially in this situation a considerable gain from performing 
a non-perturbative 0(a) improvement. In this paper we therefore will initiate the non- 
perturbative computation of the improvement coefficients starting by determining c sw 
for Nf = 2 dynamical flavors of Wilson fermions. A short account of our work has 
already appeared in [17|. 

We emphasize again that, although we expect to be able to accelerate the approach 
to the continuum limit by determining the improved theory, nevertheless the extrapo- 
lation to zero lattice spacing has to be performed. Indeed, our results discussed below 
indicate that higher order corrections can still be non-negligible. 



2 Determination of c sw 

Our determination of c sw closely follows f8[|n]]. In these references on-shell 0(a) im- 
provement and the use of the Schrodinger functional in this context are also thoroughly 
explained. In order to make the present paper reasonably self-contained, we will intro- 
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duce the improvement condition explicitly and outline the computation of c sw . Unex- 
plained notation is taken over from P,|lC[|. 

2.1 0(a) improved QCD 

We start from Wilson's formulation of lattice QCD flj~8|| . The action is the sum of the 
usual plaquette terms and the quark action 

S F = a i Y^( x )( D + ^{x), (2.1) 

X 

where a denotes the lattice spacing. The Wilson-Dirac operator, 

^ = H( V m + V m )7m-«V*V m }> ( 2 - 2 ) 

contains the lattice covariant forward and backward derivatives, and V^. Energy 
levels and on-shell matrix elements computed with this action approach their continuum 
limits with a rate that is asymptotically linear in the lattice spacing. These leading 
linear terms may be cancelled by adding one improvement term to the Wilson-Dirac 
operator $T^j): 

ia 
T 



-^impr — D ~\~ C sw O ^ v Ffj_i/ , (2-3) 



where F^ v is the standard discretization of the field strength tensor H . The coefficient 
Csw (so) is a function of the bare gauge coupling g$ and, when it is properly chosen, 
it yields the on-shell 0(a) improved lattice action, which was first proposed by Sheik- 
holeslami and Wohlert QQ. 

When considering matrix elements of local operators, their improvement has to be 
discussed as well. Here, we only need the improved isovector axial current, 

{A x )l = Al + ac^dl + d^, (2.4) 

where 

Al(x) = ^(x) 7m75 ^(x), (2.5) 

P a (x) = V^) 75 ^V(x), (2-6) 

and d*^ denote the standard forward and backward lattice derivative and the Pauli- 
matrices r a act on the flavor indices of the quark fields. The pseudo-scalar density P 
needs no 0(a) improvement term, whereas for the axial current one introduces ca as 
another improvement coefficient. Current and density defined above are not renormal- 
ized, but their multiplicative renormalization is of no importance in the following. The 

1 For completeness we note that special care has to be taken, when massless renormalization schemes 
are used. This issue is discussed in ref. but is not of immediate relevance here, where we want to 
perform a non-perturbative computation of c sw (go) for Nf = 2. 
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coefficients c sw and ca are functions of the bare coupling but do not depend on the quark 
mass. Their perturbative expansion is known to 1-loop accuracy [|19|,|9|], in particular 

c sw = l + c«<? 2 + O(^), C W = 0.2659(1). (2.7) 

Non-perturbatively, c sw and ca can be computed by imposing suitable improvement 
conditions. 



2.2 The improvement condition 

The general idea for formulating an improvement condition to fix the 0(a) counter- 
term in the action is that the 0(a) terms violate chiral symmetry. Hence chiral Ward 
identities are violated at non-vanishing values of the lattice spacing. In particular, the 
unrenormalized PC AC relation 

\(d, + eft{{Ai)l(x) O) = 2m(P a (x) O) (2.8) 

contains an error term of order a in Wilson's original formulation, which is reduced to 



0(a 2 ) in the improved theory. Eq. (2.8) can be taken to define a bare current quark 
mass m. Depending on the details of the correlation functions, such as the choice of the 
kinematical variables 0, the position x and boundary conditions, one obtains different 
values of m. These differences are of order a in general and are reduced to 0(a 2 ) by 
improvement. Requiring m to be exactly the same for three choices of the kinematical 
variables allows us to compute the improvement coefficients c sw and c\. 

In more detail, we now consider the Schrodinger functional [20-22] with boundary 
conditions 

U(x,k)\ XQ=0 = exp(aC fc ), C k = ^diag (-tt, 0, tt) (2.9) 
U(x,k)\ XQ=T = exp{aC' k ), C' k = ^diag (-5vr, 2vr, 3vr) (2.10) 

for the gauge fields and boundary conditions for the quark fields as detailed in ref. flOfl 
(taking 8 = 0). For O we choose 

O a = « 6 EC(y)75yC(z), (2.11) 

where Q (() are the "boundary (anti) quark fields" || at time xo = 0. Similarly we use 

0' a = a^'{Y)l^a^ (2.12) 
y 5 z 



with the "boundary fields" at xq = T. Eq. (2.8) then leads us to consider the correlation 
functions 

f A (x ) = -\{Al(x)O a ), f P (xo) = -\(P a {x)O a ) (2.13) 
f' A (T-x ) = +UA a (x)O' a ), MT-xo) = -\{P a {x)0' a ), (2.14) 
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and a current quark mass is given by 

m(x ) = r(x ) + c A s(x ), (2.15) 

where 

r(x ) = 1(3*0 + d )f A (x )/f P (x ), (2.16) 
s(x ) = lad*,d o f P (x )/f P (x ). (2.17) 

Another mass m' is similarly defined in terms of the primed correlation functions. Im- 
provement conditions may be obtained, e.g. by requiring m = m! for some choice of 
xq. In order to obtain an improvement condition that determines c sw , it is, however, 
advantageous to first eliminate ca, which is unknown at this point. To this end one 
observes that the combination 

M(x ,yo) = m(x ) - s(x )—— — — (2.18) 

s {yo) ~ s (yo) 

is independent of ca, namely 

M(x ,y ) = r(x ) - s(x ) ^ ~ ^ . (2.19) 

s{yo) - s '(yo) 



Furthermore, from eq. ( 2.18 ) one infers that M coincides with m up to a small correction 
of order a 2 (in the improved theory); M may hence be taken as an alternative definition 
of an unrenormalized current quark mass, the advantage being that we do not need to 
know ca to be able to calculate it. 

Now we define M' in the same way as M, with the obvious replacements. It follows 
that (amongst others) the difference 

AM = M(jT, jT) — M'(|T, jT) (2.20) 

must vanish, up to corrections of order a 2 , if c sw has the proper value. This coefficient 
may hence be fixed by demanding 

AM = AM {0 \ (2.21) 

Here AM^°\ the value of AM at tree-level of perturbation theory in the 0(a) improved 
theory, is chosen instead of zero, in order to cancel a small tree- level 0(a) effect in c sw . 
In this way one ensures that the values for c sw determined non-perturbatively approach 
exactly one when go — > 0. To complete the specification of the improvement condition, 
we choose L/a = 8, T = 2L and evaluate eq. ( 2.21 ) for quark mass zero, i.e. M = 0. 
(We use M without arguments to abbreviate M(^T, jT) from now on, while AM as 
defined in eq. (2.20) refers to xq = |T.) 

The small tree-level lattice artifact in eq. ( |2.21| ) is then evaluated to 



a AM (0) 



= 0.000277 at L/a = 8. (2.22) 

M=0,c sw =l 



Note that in the Schrodinger functional it is possible to set the quark mass to zero, 
since there is a gap in the spectrum of the Dirac operator of order 1/T. 
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2.3 Numerical results for c= 



For a range of bare couplings go we want to solve eq. ( 2.21| ) for c sw . The general 



numerical procedure (which was used in |T(|) to achieve this is summarized as follows. 

i) For fixed parameters go, c sw and a few suitably chosen values of the bare quark 
mass, compute M and AM and interpolate linearly in M to find AM at M = 0. 

ii) At fixed go, repeat i) for a few values of c sw and find the value of c sw that solves 
eq. fl2~2ll ) by a linear fit m c sw . 



iii) Repeat i) and ii) for sufficiently many values of go to be able to find a good ap- 
proximant c sw (go) for the range of go that is of interest. 

Since we are now interested in the theory with two flavors of dynamical fermions, the 
calculation of M and AM for each choice of mo, go, c S w requires a separate Monte Carlo 
calculation. We did these calculations with the Hybrid Monte Carlo algorithm. Details 
of the simulations and our error analysis are discussed in Sect. |5[ Here we note that these 
simulations are CPU-time intensive and it is therefore desirable to limit the number of 
simulations to be performed. We achieved this by a slight modification of i) and ii). 

First of all, it was already found in the quenched approximation that AM is a very 
slowly varying function of M |l0| . Only negligible errors (compared with the statistical 
ones) are introduced if one keeps M just close to zero, say \aM\ < 0.03, instead of exactly 
zero. Of course, we must be careful when generalizing from the quenched approximation 
since there the quark mass enters only through the quark propagator (valence quark 
mass), while here the quark mass is present in the fermion determinant as well (sea 
quark mass). For one set of parameters c sw ,go, we have therefore verified that also in 
the full theory AM depends only weakly on M. This is shown in Fig. |l[ Note that 
go is chosen relatively large, in order not to be in the situation where quark loops are 
trivially suppressed. Apart from this test, we have verified for each pair of Cs W ,go that 
the dependence of AM on the valence quark mass at fixed sea quark mass is much 
smaller than the statistical uncertainty of AM, even when the valence quark mass is 
increased to values of order 0.05/a. From here on we therefore use AM for \aM\ < 0.03 
as estimates for AM at M = rather than performing several runs and interpolating 
to M = 0. Note that most of our data for M , given in Table 0, are in fact much smaller 
than our bound \aM\ = 0.03. Apart from the test just mentioned, we do of course 
always have valence and sea quarks with the same mass. 

Next, let us discuss step ii). In particular, we want to show that it is not really 
necessary to perform calculations for several values of c sw for each value of the bare 
coupling go- Let us denote by c]^ pr (go), the desired value of c sw for which the improve- 
ment condition is satisfied. For c sw close to Cg™ pr (<?o), AM will depend linearly on c sw 
and the lattice artifact may be written as AM — AM' ' = uj ■ (c sw — c s ™ pr ), with a 



slope u(go) dependent on the gauge coupling. The numerical procedure adopted in [1C] 



consists of fitting AM to this linear dependence separately for each value of the bare 



5 



< 



0.004 



0.002 



- 



-0.002 




Figure l: Mass dependence of the lattice artifact AM at f3 = Q/g^ = 5.4, c sw = 1.7275. 



coupling <7o- To improve on this, we may use the fact that the slope u)(go) is expected 
to be a smooth function of go- Indeed, the numerical values of uj as determined in the 
quenched approximation show that uj(go) can well be described by a linear behavior in 



So, 



-0.015- (1 + wiflg) 



(2.23) 



with a value of uj\ small such that uj(go) does not differ much from the tree-level value 
uj(0) = —0.015. This holds also for our results in full QCD as may be inferred from 
Table |l|. It is therefore not necessary to determine uj for each value of go separately. 
Instead, we use its smoothness to parametrize uj by an effective first order dependence 
on <7q, and perform one global fit to all our data of AM of the form 



aAM - 0.000277 = uj(g ) ■ (c s 



„impr 



(go))- 



(2.24) 



The fit parameters here are uj\ and the desired values Cg™ pr (gQ ) at the different points 
g where we have data. As an aside we remark that the fit to the Nf = 2 data, 
u = —0.015 • (1 — 0.33gg), describes the slopes uj also for Nf = 0. 

In this way, we obtain the values Cg™ pr (go^) that satisfy our improvement condition. 
They are shown as data points in Fig. |2[ In the whole range of go, they are well 
parametrized by 



1 - 0.454c/, 



0.175^ + 0.012^ + 0.0455, 



1 - 0.720^ 



(2.25) 



This representation, shown by the full curve in Fig. ^, is the main result of our work. 
It should be taken as a definition of the improved action for future work. In this way it 
is guaranteed that observables in the improved theory are smooth functions of the bare 
coupling and extrapolations to the continuum limit can be performed. 
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Table l: Results for the current quark mass M and the lattice artifact AM. 



P 


K 




aM 


aAM 


12.0 


0.12981 


1.1329500 


-0.0102(2) 


-0.0007(2) 


12.0 


0.12981 


1.1829500 


-0.0031(1) 


-0.0002(1) 


1Z.U 


U.lzyol 


l.zozyoUU 


U.UUoo^l J 


n nnn/i fi \ 
U.UUU4(1 J 


9.6 


0.13135 


1.2211007 


-0.0031(2) 


0.0000(1) 


7.4 


0.13460 


1.2155946 


-0.0008(5) 


0.0017(3) 


7.4 


0.13396 


1.2813360 


0.0037(3) 


-0.0001(3) 


7.4 


0.13340 


1.3445066 


0.0050(3) 


-0.0002(4) 


7.4 


0.13245 


1.4785602 


—0.0002(5) 


—0.0022(4) 


6.8 


0.13430 


1.4251143 


0.0014(4) 


0.0000(3) 


6.3 


0.13500 


1.5253469 


0.0013(6) 


-0.0004(4) 


6.0 


0.13910 


1.2659000 


0.0087(7) 


0.0018(7) 


6.0 


0.13640 


1.5159000 


0.0025(7) 


-0.0002(6) 


6.0 


0.13330 


1.7659000 


0.0108(5) 


-0.0014(6) 


5.7 


0.14130 


1.2798947 


0.005(1) 


0.0055(9) 


5.7 


0.13770 


1.5569030 


0.004(1) 


0.0007(7) 


5.7 


0.13410 


1.8339110 


0.0045(6) 


-0.0016(5) 


5.4 


0.14360 


1.3571728 


0.023(3) 


0.004(4) 


5.4 


0.13790 


1.7275432 


0.009(1) 


0.0003(9) 


5.4 


0.13250 


2.0979135 


0.007(2) 


-0.0016(8) 


5.2 


0.13300 


2.0200000 


0.123(4) 


-0.0006(9) 



As in the quenched approximation, the Nf = 2 result is well approximated by 
perturbation theory (eq. ( |2.7[ )) for small couplings, say g$ < 0.5. For larger couplings 
it grows quickly, although not quite as steeply as in the quenched approximation (the 
dashed curve). 

An important issue is the question for which range of couplings eq. (2.25) is ap- 
plicable. A priori it is to be trusted only for j3 ^ 5.4, where c sw was computed by the 
numerical simulations. Extrapolations far out of this range are dangerous. We did, 
however, investigate whether it is justified to use eq. ( 2.25| ) at somewhat smaller (3. 
Unfortunately, already for (3 ~ 5.2 the numerical simulations close to M = turned 
out to be too time consuming for our 256 node APE-100 computer providing 6Gflop/s 
(sustained). What helps again is that AM hardly depends on M. Therefore we expect 
to find a small value of AM also at larger values of M, say \aM\ < 0.15 (see Fig. ||), 
if the action is properly improved. Our calculation at (5 = 5.2 and aM ~ 0.12 yielded 
aAM = —0.0006(9), indicating that our improvement condition is indeed satisfied for 
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Figure 2: Non-perturbatively determined improvement coefficient c sw for Nf = 2. The 
dotted line shows first order perturbation theory and the dashed curve is the 

result for N f = Q. 

c sw as given by eq. (|2.25|) for f3 as low as (3 = 5.2. However, this calculation also re- 
vealed that higher order lattice artifacts rapidly become stronger when j3 is taken below 
(3 = 5.4. We will return to this issue in Sect. f|. 

3 Estimate of n c 

For future applications of the improved action, it is useful to roughly know the position 
of the critical line 

K = K c (g ), K= 2(^+4) , (3.1) 

which is defined by the vanishing of the current quark mass. We will give an estimate 
for n c (go) in this section. The critical line eq. (|3.1| ) has an intrinsic uncertainty, since 
the position where the current quark mass vanishes depends on the very definition of 
the current quark mass. The uncertainty in the current quark mass is 0(a 2 ), translating 



8 



0.14 - 



-i — i — i — i — r 



~\ — i — I — i — i — i — I — r 



0.135 - 



0.13 - 



0.125 



■■-r^i i I i 

0.2 



0.4 



0.6 



0.8 



1.2 



Figure 3: The critical line in the improved theory. The dashed curve gives the polynomial 
approximation to the non-perturbative result and the dotted line indicates first order 
perturbation theory. 



into an 0(a 3 ) uncertainty in k c . From Fig. 9 in ref. [10], we estimate that the values of 
n c that one determines on a lattice with L/a = 8, might differ from k c determined on 
larger lattices by as much as 2 x 10~ 4 . This has to be kept in mind as an important 
limitation of the present determination of k c . 

Our basis for an estimate of k c are the numerical data for M at a number of values 
of the parameters k, go, c sw . For (3 > 5.4 the values of aM are rather small, see Table |]. 
One can convince oneself easily that it is justified in this case to use the 1-loop relations 



M 



Z m m q (1 + bam q 



1 1 

qj, am q = 

b = -1/2 - 0.0962 • ^g, Z m = 1 + 0.0905 • g%, 



(3.2) 
(3.3) 



to determine k c from k, aM. The uncertainties due to left out higher order terms in 
the above equations may be neglected compared to the statistical uncertainties in M, 
since k c is close to k in any case. As mentioned earlier, for (3 = 5.4, c sw = 1.7275, we 
have a series of different values of k. We have investigated whether the 1-loop relations 
describe M(k) in this case as well. Up to aM ~ 0.14 and within an error margin of 
about 5%, this is shown to be the case by our data. Therefore we also included the 
point = 5.2, aM ~ 0.12 in the analysis - despite the relatively large mass of that 
point. The statistical uncertainties in aM are then translated into uncertainties in k c . 

Next, we need to interpolate k c in c sw to the proper values given by eq. ( [2. 25 ). 
These are denoted by c™ pr (<?o) below. In this interpolation, we use again that the slope 
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of function of c sw is a smooth function of oq. We fit all values of k c to 

«c = 4 mpr (5o) + M5o)-(csw-C pr (9o)), (3.4) 
% ) = j^Hg Q f\ (3.5) 



8=1 



Here, k\ is set to its perturbative value of —0.053/8 [0], while &2,&3 and the desired 
values fit parameters. Eq. (|3.4|) fits all values of k c within an error margin 



of 2 • 10~ 4 , which is also roughly the statistical accuracy of k c . 

The results of the fit, Kj. mpr (<7o), are shown as data points in Fig. ||. They are well 
approximated by the polynomial (dashed curve), 

Kc = l/8 + 4 1} 3o + °- 008 %o -°- 0272 3o + - 042 ^ ( 3 - 6 ) 
4 1 ) = 0.008439857 . (3.7) 



The critical line is never very far from the 1-loop result k c = 1/8 + Kc <7o [ 25 , 19 . 

We emphasize once more that we regard eq. ( |3.6| ) as a first estimate and expect 
only a rather crude precision (statistical + systematic) of order ±3 • 10 . 

4 0(a 2 ) effects after improvement 

Once the improved action is known up to O(a), a new question arises immediately: how 
large are higher order lattice artifacts after improvement? In the quenched approxima- 
tion this has been investigated in the Schrodinger functional and also for low-energy 
hadronic observables [1^,13,14,15,16]. Only rather small 0(a 2 ) effects have been found. 



In the full theory, such investigations will still take some time. Since our observable 
M(xo, T/4) should not depend on xq, apart from the 0(a 2 ) effects, it might serve mean- 
while as estimator of the higher order lattice artifacts. 

We plot aM(x , T/4) and aM'(x ,T/4) in the lower part of Fig. |, for (3 = 5.4 and 
a value of c sw where AM almost vanishes as is obvious from the agreement between M 
and M' . Results for two values of k are shown. They correspond to rather different M. 
At both values of M we observe that the variations of aM(xQ,T/4) and aM'(xo,T/A) 
are within a corridor of ±0.003 as long as 4a < xo < T — 4a. As one takes xq closer to 
the boundaries, the values of M drop significantly below the values of M belonging to 
the corridor. 

The middle part of the figure shows the situation for f3 = 5.2. Here, xq = 4a is 
already outside of a corridor of the same size and xq = 3a is quite far below. This indi- 
cates that at (3 = 5.2 the 0(a 2 ) lattice artifacts are already quite significant. Of course, 
their impact on quantities such as the hadron spectrum remains to be investigated in 
detail. Nevertheless, our analysis indicates that for values of (3 below (3 = 5.2, O(a) 
improvement ceases to be very useful. 
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Figure 4: M and M' for /3 = 5.4, c sw = 1.7275 (bottom part of the figure) and (5 
o.z, Cg W as given by eq. ( ggp (middle). The very top section is for j3 — 6.0, Cg W 
1.8659, Nf = 0. The time extent of the lattice is T = 16a. 



The above conclusion is further strengthened by a comparison with the quenched 
approximation (top part of the figure) where - at a relatively large lattice spacing of 
a ss O.lfm - the xo-dependence is hardly visible. 

In interpreting the lattice artifacts in M(xq,T/4), one must be careful about the 
following point. Close to the boundaries, xq = and xq = T, the spectral decomposition 
of the correlation functions, /, /', receives noticeable contributions also from intermedi- 
ate states with energies of the order of the cutoff. In such a kinematical regime, on-shell 
improvement is not applicable. We should therefore not put too much weight on the 
behavior very close to the boundaries. (For this reason we are not showing the points 
xo = a and xq = T — a va. the figures.) Even with this reservation in mind, the difference 
between the quenched approximation at a ~ O.lfm and JVf = 2 QCD at /3 = 5.2 is 
striking. 

We further note that preliminary results of the UKQCD collaboration 26] indicate 
that the lattice spacing is larger than O.lfm at /3 = 5.2, Nf = 2. This is in line with the 
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considerable size of lattice artifacts visible in Fig. ||]. 



5 The Hybrid Monte Carlo simulation 

In this section we want to present a number of aspects of the simulations we have 
performed. All numerical results quoted in this paper have been obtained by using the 
Hybrid Monte Carlo (HMC) algorithm p7| |. One simulation (at = 6.8) was repeated 



with the Polynomial Hybrid Monte Carlo algorithm [28,29J, and completely consistent 
values were found for all observables compared [p0[ . 

Our particular implementation of the HMC algorithm using even/odd precondition- 
ing [31] and including the improvement term of eq. ( |2.3| ) is described in detail in [32]. 



Throughout the simulations we used the higher order leap-frog integrator suggested in 
|33[] to integrate the equations of motion to a trajectory length of one, implementing 



eq. (6.7) of ref. [33] with n = 4. 

The program was run on the Alenia Quadrics (APE) massively parallel machine 
with 256 nodes. We decided to distribute our lattice of size 16 • 8 3 on these machines 
in such a way that we ran N rep = 32 replica in parallel. These replica are independent 
copies of the lattice for identical choices of all parameters (apart from random numbers); 
we end up with iV re p statistically independent simulations for each set of parameters. 

When starting our simulations, we tried to keep the acceptance rate to about 90%. 
However, it happened rather frequently that, despite this relatively large acceptance 
rate, in one of the replica a considerable number of trajectories in a row were rejected, 
inducing substantial autocorrelation times. Our solution to overcome this problem, 
was to perform every iV sa f e trajectories one with a much smaller step size, which we 
call the "safety step" . In principle, iV sa f e as well as the corresponding step size can be 
drawn randomly from arbitrary distributions, while still preserving detailed balance of 
the HMC algorithm. The choice of N sa £ e and the step size must, however, not depend 
on the Monte Carlo history. For simplicity we chose fixed values for iV sa f e and the 
step size before each run, determining reasonable values from our experience gained in 
simulations at other /3's and also during the thermalization phase. 

We give in Table || the most relevant parameters of our simulations. By iVt ra j 
we denote the number of trajectories that equals the number of measurements per 
replicum. The strategy in performing the simulations was to start at a high value of 
(3 = 12 and then to decrease /3 successively to the values given in Table ||. We found, 
when changing from one (3 to the next smaller value, that only a short thermalization 
time of the order of ten trajectories was required. 



5.1 Dynamics of the HMC — cost of a simulation 

To achieve our physics goal, we performed quite a large number of simulations with dif- 
ferent values of the parameters k, f3 and c sw . Although this was not our main objective, 
it is natural to also attempt to learn something about the dynamics of the algorithm. 
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Table 2: The parameters of the runs (we typically had iV sa f e = 6). We denote by 5t the 
step size of the leap-frog integrator, giving in brackets the value of the safety step size. 
P acc denotes the acceptance rate and Nqg the average number of Conjugate Gradient 
iterations per inversion. 
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Before doing so, let us recall the special physical situation where these simulations were 
performed. We kept L/a and T/a fixed, and stayed close to the critical line. In contrast 
to the situation in infinite volume, the infrared cutoff is given by T and the quark mass, 
once small, plays only a minor role for the physics and hence also for the dynamics of 
the HMC algorithm. For the latter, a relevant quantity is the condition number, 

k = ^ , (5.1) 

where A m j n (A max ) are the lowest (largest) eigenvalues of the preconditioned matrix Q 2 , 
see |3^]. This quantity was computed in all simulations, using the method of ref. |3^j35[| . 
For quark mass zero and with Schrodinger functional boundary conditions, k has 



a value of order (T/a) for free fermions [21]. In the interacting theory this is modified 



by terms of order g 2 .. For the MC dynamics, the inverse square root of the condition 
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number is expected to roughly take over the role that the quark mass plays for infinite 
volume simulations. 
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Figure 5: The expectation value of the condition number, eq.(|5.1|). When more than 
one value is plotted at a given value of g\ they correspond to different values of c sw . 
The tree- level value is shown at g 2 = 0. 



In Fig. | we see a strong rise of (k) when g 2 , is increased^} For the largest values 
of the bare coupling, it is a factor three larger than the tree-level value at g 2 , = 0. Of 
course, this is immediately reflected in a considerable rise in the number of CG iterations 
needed for the various "inversions" of Q 2 . In fact, in our particular implementation of 
running iV re p simulations in parallel on a SIMD machine, there is an additional impor- 
tant overhead: the inversions have to run until the "slowest" replicum has converged. 
Therefore the number of CG iterations depends on the maximum of k over the number 
of replica, denoted by k max . Since also the relative variance of k, defined by (A; 2 ) / (k) 2 — 1 
grows from around 0.1 at /3 = 12 to 0.5 at j3 = 5.4, we find that (k max )/(k) can be as 
large as (/c max ) / (k) 4. We have not exactly quantified the corresponding loss in speed 
of the HMC as a function of iV rep , but expect that this may result in an overhead of 
20-40% for iV rep = 32 |. 

In addition to its direct relation to Nqg, the condition number also is an important 
parameter, which has an influence on how large 5t may be chosen for a desired value 
of the acceptance P a cc- One may argue |^6|,^] that - for our leap-frog integrator - P acc 
is approximately a universal function of the combination y = (St) 2 (/c 3//2 ) . We observe 

2 The average is the usual ensemble average. 

3 We remark that the condition number also develops a significant dependence on the quark mass 
when f3 becomes small and/or large quark masses are considered. For example at /3 = 5.4, c sw = 1.7275 
we find (k) = 1871(95) at M = 0.009(1) whereas (k) = 800(14) at M = 0.086(2). 
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Figure 6: The acceptance rate as a function of y = 5r 2 {k 3 ^ 2 ). Only data with \aM\ < 
0.03 are included. 

a rough consistency with such a scaling law (cf. Fig. |6|). 

As already mentioned earlier, the combination of the above effects leads to a large 
increase in the cost of the simulations when g 2 is increased. In the following section 
we will see that autocorrelations (mildly) enhance this effect further. This was one of 
the reasons why we did not decrease (5 below (3 = 5.4, where we invested already of the 
order of a month of CPU-time on our 6 Gflop/s (sustained) machine. Given that the 
length of our lattice might be around 2fm or larger for (3 < 5.4, it is in fact not a great 
surprise that simulations with very light quarks become difficult in this regime. 

5.2 Error analysis and autocorrelation times 

We used two different methods to obtain the errors of our observables. The first one 
is to average observables over all measurements done within one replicum. One is then 
left with iVrep = 32 statistically independent measurements. The errors, computed by 
jack-knife, then have no systematic uncertainty due to autocorrelations but they are 
only subject to a relative statistical uncertainty of (2N rep )~ 1 ^ 2 = 12.5%. 

Alternatively we performed a jack-knife procedure combined with the following 
blocking analysis. We generated an ensemble of blocked measurements by averaging 
(for each replicum) subsequent measurements over blocks of length L block- The blocked 
measurements of different replica were joined to form one common sample (with iVbiock = 
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-/VrepiVtraj/Lbiock blocks) from which we then computed the jack-knife error A(O) of 
the observable O. For large statistics, these errors will have a negligible statistical 
uncertainty, but still suffer from systematic corrections due to autocorrelations. For 
autocorrelation times r, which are small with respect to -Lbiock; the systematic effect due 
to autocorrelations is proportional to r/Lbiocki while the relative statistical uncertainty 



of this error estimate is approximately given by (2N] 



blockj 



-1/2 



A typical situation is 



shown in Fig. ^ for the error of the lattice artifact AM. Since the autocorrelations for 
this quantity are not very large (see below), the error estimates converge quite well as 
-^biock —> -^VrcpQ The most precise estimate of the true error of AM would probably 
be given by an extrapolation to iVbiock = A^ep from larger values of iVbiock- On the 
other hand, there is a systematic (and subjective) bias in such an extrapolation and we 
prefer to quote the unbiased error estimate discussed before, which has a satisfactory 
statistical precision of 12.5% anyhow. 
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Figure 7: The error of the lattice artifact as a function of the number of blocks iVbiock = 
-/Vrep-^Vtraj/^biock- The naive error is A na i ve = 4.5 • 1CT 4 . The data are shown for 
parameters = 5.4, k = 0.1379, c sw = 1.7275432. 

Our definition of the integrated autocorrelation time r; n t of an observable O is 

- (0) - * (ra>) 2 ■ (5 - 2) 

where A na ; ve is the naive error (computed with -Lbiock = !)• Estimating the true error 
as described above, i.e. having -Lbi oc k = ^traj , this quantity has a statistical uncertainty 

4 We then have Lbiock = iVtraj and are back to the first method, where there is no systematic 
uncertainty of the error estimates. A little thought reveals that in general one expects an approximately 
linear behavior in iVbiock — ^V rep as suggested by Fig. [71 
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of approximately A(r; n t) = J 2/N Tep ■ T int . We show in Fig. |8] the integrated autocor- 
relation times for the lowest eigenvalue A m i n (Q 2 ) and for our main observable AM. 
The figure indicates that the autocorrelation times increase with growing bare coupling. 
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Figure 8: The integrated autocorrelation times of the lowest eigenvalue A m i n (Q 2 ) and 
the lattice artifact AM as a function of the bare coupling. When more than one value 
is plotted at a given value of g$, they correspond to different values of c sw . 

Furthermore, the integrated autocorrelation times depend strongly on the observable: 
for the lattice artifact AM, T mi is always much below the one for A m i n (Q 2 )- We note 
that, on the contrary, the integrated autocorrelation times for the correlation functions 
Ja and fp, from which AM is derived, are similar to the ones for X m m(Q 2 )- Even 
larger autocorrelation times for other observables have been observed after cooling. As 
discussed in the following subsection, it is, however, very unlikely that they invalidate 
our error estimate for AM. 



5.3 Metastable states 

As an additional interesting observable we have also monitored the renormalized cou- 
pling constructed as described in |3j| (with the derivative with respect to the background 
gauge field acting only on the pure gauge action). While for the larger values of f3, this 
quantity showed autocorrelation times of the same order as r; nt (AM), we observed a 
considerable rise in the integrated autocorrelation times eq. (|5.2| ) for j3 < 5.4. We were 
worried that this behavior as well as the large values of Ti n t(A m i n ) are due to difficulties 
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Figure 9: Part of the HMC history of the gauge field action Sq after cooling. We show 
three of our 32 replica. Simulation parameters are /3 = 5.4, k = 0.1325, c sw = 2.0979. 
The bounds (5.3) and (|5.4|) are shown as dotted lines. 



of the HMC algorithm to sample different topological sectors as they have been observed 
before in large volume simulations [^] (see also p0[). 

In order to investigate this possibility, we also examined the gauge fields after 
performing a number of cooling iterations [^TJ , computing among other observables the 
gauge-field action and the topological charge Q ("naive definition", see [^H). For our 
abelian background field, the gauge field action satisfies the bounds [^] (a small 0(a) 
correction is neglected here) 

g%S G > tt 2 , Q = 0, (5.3) 
9 2 S G > 8tt 2 \Q\. (5.4) 

We note that eq. ( |5.4| ) is derived for smooth fields in the continuum Schrodinger func- 
tional, while eq. ( |5.3| ) is valid also for rough fields. 

In all our runs where we performed cooling, we observed only once a value of Q 
different from zero. The short Monte Carlo (MC) time interval where this happened 
is contained in the MC history shown in the middle part of Fig. || where g^Sc ~ 80. 
Exactly during the interval where the action is above the limit eq. ( |5.4| ) with \Q\ = 1, we 
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also observe that Q has an (approximately) unit value. Given their rareness, topological 
fluctuations appear not to be relevant in our small volume simulations and there is an 
indication that they are in fact not long-lived. In particular, slow topological fluctuations 
are clearly not the cause of the large autocorrelation times observed in our simulations. 

Most of the time, the action (always after cooling in this section) is close to the 
absolute minimum of eq. (|5.$). However, we also observed longer sections in the MC- 
history, where it remains at other values (e.g. g^Sa ~ 40). These appear to correspond 
to non-trivial local minima of the action (with Q = 0). Since these states are stable 
over several tens of trajectories, there is a mode with a very large autocorrelation time 
in the HMC simulations. 

We now have to investigate whether our observable AM is affected by this large 
autocorrelation time, and the small r; n t determined in the previous section is misleading. 
We therefore want to know the correlation of AM with these states. As a measure of 
such a correlation we consider the linear correlation coefficient Cor (AM, c/qSq)- A stan- 
dard definition of the correlation coefficient of primary observables, a, b, (i.e. observables 
that are given directly as ensemble averages) is Cor(a,6) = Cov(a, b)/[Cov(a, a)Cov(6, b)] 1 / 2 , 
where Cov(a, b) = ((a — (a))(b— (b)) ). Since we are mainly interested in AM, a derived 
quantity (i.e. a function of primary observables), we generalize the above definition to 

Cor(a, b) = [a 2 {ab) - (6) V(a) - (a) V(6)} / {2(a) (b) a (a) a {b)} , (5.5) 

which can easily be used also for derived quantities, by computing the variances o~ 2 (a) 
by a jack-knife analysis. 

We found very small correlation coefficients between Sq and all fermionic ob- 
servables considered, where of course the fermionic observables are the ones that en- 
ter the physics, i.e. they are computed without cooling. As an example we quote 
Cor(AM,sgS G ) = 0.02 for (3 = 5.4, c sw = 2.0979. We conclude that the metastable 
states that we observed are not a matter of major concern for our error analysis in the 
determination of c sw . 



This corroborates our error analysis of Sect. 5.2. 
6 Conclusions 

In this paper we have performed the first step to achieve a non-perturbatively O(a) 
improved lattice theory for Wilson fermions. We have computed the improvement co- 
efficient c sw as a function of f3 = Q/g 2 , in a range of couplings j3 > 5.2. This range of 
couplings seems to cover values of lattice spacings where simulations for e.g. studying 
hadronic properties can be performed. As our main result we consider the parametriza- 
tion of eq. ( p. 25 ), which determines the non-perturbatively improved action for Nf = 2 



dynamical flavors of Wilson fermions. A number of quantities such as the hadron spec- 
trum can now be computed with lattice artifacts starting only at order a 2 and, indeed, 
such a programme has been initiated already . In order to achieve the same accuracy 
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for hadronic matrix elements, also the improvement and normalization of the operators 
have to be determined along the lines exploited already in the quenched approximation 



1 11. 43. 44] or following new suggestions [45]. 



Although after 0(a) improvement the linear lattice artifacts are cancelled, higher 
order discretization errors will remain. Indeed, we found indications that for (3 < 5.4 
these effects can become significant. It would be desirable to investigate these effects 
further for additional physical observables. 

During the course of our small volume simulations, we were also able to study 
the dynamical behavior of the Hybrid Monte Carlo algorithm used throughout this 
work. We found that the condition number of the fermion matrix rises with increasing 
coupling strength. The condition number directly influences various ingredients of the 
algorithm: changing (3 from large values to = 5.4, we found that the number of 
Conjugate Gradient iterations increased by a factor of about 1.6, the step size had to 
be decreased by a factor of 2 and the autocorrelation times increased by about a factor 
of 2. All these effects add up to make simulations very expensive when (3 is chosen to 
be P ~ 5.2 or even smaller. 
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